Abstract. A run is an inclusion-maximal occurrence in a string (as a
subinterval) of a repetition $v$ with a period $p$ such that $2p \le |v|$. The
exponent of a run is defined as $|v|/p$ and is $\ge 2$. We show new bounds on
the maximal sum of exponents of runs in a string of length $n$. Our upper
bound of $4.1n$ is better than the best previously known proven bound of
$5.6n$ by Crochemore & Ilie (2008). The lower bound of $2.035n$, obtained
using a family of binary words, contradicts the conjecture of Kolpakov &
Kucherov (1999) that the maximal sum of exponents of runs in a string
of length $n$ is smaller than $2n$.



1 Introduction 

Repetitions and periodicities in strings constitute one of the fundamental topics in
combinatorics on words [1, 14]. They are also important in other areas: lossless
compression, word representation, computational biology, etc. In this paper we
consider bounds on the sum of exponents of repetitions that a string of a given
length may contain. Repetitions are also studied from other points of view, such
as the classification of words (both finite and infinite) avoiding repetitions of
a given exponent, the efficient identification of factors that are repetitions of
different types, and bounds on the number of various types of repetitions
occurring in a string. Known results on the topic and a deeper description of
the motivation can be found in the survey by Crochemore et al. [4].

The concept of runs (also called maximal repetitions) was introduced
to represent all repetitions in a string in a succinct manner. The crucial
property of runs is that their maximal number in a string of length $n$ (denoted
$\rho(n)$) is $O(n)$, see Kolpakov & Kucherov [10]. This fact is the cornerstone of any
algorithm computing all repetitions in strings of length $n$ in $O(n)$ time. Due to
the work of many people, much better bounds on $\rho(n)$ have been obtained. The
lower bound $0.927n$ was first proved by Franek & Yang [7]. Afterwards, it was
improved by Kusano et al. [13] to $0.944565n$ using computer experiments,
and very recently by Simpson [18] to $0.944575712n$. On the other hand, the
first explicit upper bound $5n$ was established by Rytter [16]; afterwards it was
systematically improved to $3.48n$ by Puglisi et al. [15], $3.44n$ by Rytter [17], $1.6n$
by Crochemore & Ilie [2, 3] and $1.52n$ by Giraud [8]. The best known result,
$\rho(n) < 1.029n$, is due to Crochemore et al. [5], but it is conjectured [10] that
$\rho(n) < n$. Some results are also known for repetitions of exponent higher than
2. For instance, the maximal number of cubic runs (maximal repetitions with
exponent at least 3) in a string of length $n$ (denoted $\rho_{cubic}(n)$) is known to be
between $0.406n$ and $0.5n$, see Crochemore et al. [6].

A stronger property of runs is that the maximal sum of their exponents in
a string of length $n$ (notation: $\sigma(n)$) is linear in $n$, see Kolpakov &
Kucherov [12]. This has applications to the analysis of various algorithms, such
as computing branching tandem repeats: the linearity of the sum of exponents
solves a conjecture of [9] concerning the linearity of the number of maximal
tandem repeats and implies that all of them can be found in linear time. For other
applications, we refer to [12]. The proof that $\sigma(n) < cn$ in Kolpakov and Kucherov's
paper [12] is very complex and does not provide any particular value for the
constant $c$. A bound can be derived from the proof of Rytter [16], but he mentioned
only that the bound he obtains is "unsatisfactory" (it seems to be $25n$).
The first explicit bound for $\sigma(n)$, namely $5.6n$, was provided by Crochemore and Ilie [3],
who claim that it could be improved to $2.9n$ using computer experiments.
As for the lower bound on $\sigma(n)$, no exact values were previously known and it
was conjectured [11, 12] that $\sigma(n) < 2n$.

In this paper we provide an upper bound of $4.1n$ on the maximal sum of
exponents of runs in a string of length $n$, and also a stronger upper bound of
$2.5n$ on the maximal sum of exponents of cubic runs in a string of length $n$. As
for the lower bound, we disprove the conjecture $\sigma(n) < 2n$ by providing an
infinite family of binary strings for which the sum of exponents of runs is greater
than $2.035n$.



2 Preliminaries

We consider words (strings) $u$ over a finite alphabet $\Sigma$, $u \in \Sigma^*$; the empty
word is denoted by $\varepsilon$; the positions in $u$ are numbered from 1 to $|u|$. For
$u = u_1 u_2 \cdots u_m$, let us denote by $u[i..j]$ a factor of $u$ equal to $u_i \cdots u_j$ (in particular
$u[i] = u[i..i]$). Words $u[1..i]$ are called prefixes of $u$, and words $u[i..|u|]$ suffixes
of $u$.

We say that an integer $p$ is the (shortest) period of a word $u = u_1 \cdots u_m$
(notation: $p = \mathrm{per}(u)$) if $p$ is the smallest positive integer such that $u_i = u_{i+p}$
holds for all $1 \le i \le m - p$. We say that words $u$ and $v$ are cyclically equivalent
(or that one of them is a cyclic rotation of the other) if $u = xy$ and $v = yx$ for
some $x, y \in \Sigma^*$.
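Both notions can be checked mechanically. The following Python sketch is ours, not the authors'; the function names are hypothetical. It assumes the standard fact that $\mathrm{per}(u)$ equals $|u|$ minus the length of the longest proper border of $u$, computed here with the classical KMP border (failure) array, and it tests cyclic equivalence via the standard characterization that $v$ is a rotation of $u$ iff $|u| = |v|$ and $v$ is a factor of $uu$.

```python
def shortest_period(u: str) -> int:
    """per(u): the smallest p > 0 with u[i] == u[i + p] for all valid i.

    Computed as |u| minus the length of the longest proper border of u,
    using the KMP border (failure) array.
    """
    if not u:
        raise ValueError("per() is not defined for the empty word")
    border = [0] * len(u)  # border[i] = longest proper border of u[:i + 1]
    k = 0
    for i in range(1, len(u)):
        while k > 0 and u[i] != u[k]:
            k = border[k - 1]
        if u[i] == u[k]:
            k += 1
        border[i] = k
    return len(u) - border[-1]


def cyclically_equivalent(u: str, v: str) -> bool:
    """True iff u = xy and v = yx for some words x, y."""
    return len(u) == len(v) and v in u + u


def rotations(w: str) -> list[str]:
    """All cyclic rotations of w (with repetitions if w is not primitive)."""
    return [w[t:] + w[:t] for t in range(len(w))]


print(shortest_period("abaab"))                  # 3
print(cyclically_equivalent("abaab", "aabab"))   # True
print(sorted(set(rotations("bab"))))             # ['abb', 'bab', 'bba']
```

The minimal and maximal rotations, `min(rotations(w))` and `max(rotations(w))`, are the words $w_{min}$ and $w_{max}$ used for handles in Section 4.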

A run (also called a maximal repetition) in a string $u$ is an interval $[i..j]$
such that:

- the period $p$ of the associated factor $u[i..j]$ satisfies $2p \le j - i + 1$,

- the interval can be extended neither to the left nor to the right without violating
the above property, that is, $u[i-1] \ne u[i+p-1]$ and $u[j-p+1] \ne u[j+1]$.

A cubic run is a run $[i..j]$ for which the shortest period $p$ satisfies $3p \le j - i + 1$.
For simplicity, in the rest of the text we sometimes identify runs and cubic runs
with the occurrences of the corresponding factors of $u$. The (fractional) exponent of
a run is defined as $(j - i + 1)/p$.
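The definition can be followed literally to list the runs of a short word. In this sketch (ours; function names are hypothetical; quadratic time, not the linear-time algorithm of Kolpakov & Kucherov), for each candidate period $p$ we scan for maximal intervals on which $u[k] = u[k+p]$; these are exactly the factors with period $p$ satisfying the two non-extendability conditions above. Such an interval is a run iff its length is at least $2p$ and $p$ is its shortest period. Positions are reported 1-indexed, as in the text, and exponents are kept exact with `Fraction`.

```python
from fractions import Fraction


def shortest_period(u: str) -> int:
    """per(u) = |u| - (longest proper border of u), via the KMP border array."""
    border = [0] * len(u)
    k = 0
    for i in range(1, len(u)):
        while k > 0 and u[i] != u[k]:
            k = border[k - 1]
        if u[i] == u[k]:
            k += 1
        border[i] = k
    return len(u) - border[-1]


def runs(u: str) -> list[tuple[int, int, int]]:
    """All runs [i..j] of u as 1-indexed triples (i, j, p), where p = per(u[i..j])."""
    n = len(u)
    result = []
    for p in range(1, n // 2 + 1):
        k = 0  # 0-indexed scan position
        while k + p < n:
            if u[k] != u[k + p]:
                k += 1
                continue
            # Extend to the right while u[m] == u[m + p]; the left end k is
            # maximal because u[k - 1] != u[k - 1 + p] (or k == 0).
            m = k
            while m + p < n and u[m] == u[m + p]:
                m += 1
            start, end = k, m + p - 1  # factor u[start..end] has period p
            # Keep it only if it is long enough and p is its *shortest* period;
            # each run is then reported exactly once.
            if end - start + 1 >= 2 * p and shortest_period(u[start:end + 1]) == p:
                result.append((start + 1, end + 1, p))
            k = m
    return result


def exponent(run: tuple[int, int, int]) -> Fraction:
    """(j - i + 1) / p, kept exact with Fraction."""
    i, j, p = run
    return Fraction(j - i + 1, p)


for r in runs("aabaabaa"):
    print(r, exponent(r))
# (1, 2, 1) 2
# (4, 5, 1) 2
# (7, 8, 1) 2
# (1, 8, 3) 8/3
```

For the word aabaabaa this gives three runs of period 1 and one run of period 3 with exponent $8/3$, so the sum of exponents is $26/3$.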

For a given word $u \in \Sigma^*$, we introduce the following notation:

- $\rho(u)$ and $\rho_{cubic}(u)$ are the numbers of runs and cubic runs in $u$, respectively;

- $\sigma(u)$ and $\sigma_{cubic}(u)$ are the sums of exponents of runs and cubic runs in $u$,
respectively.

For a non-negative integer $n$, we use the same notations $\rho(n)$, $\rho_{cubic}(n)$, $\sigma(n)$
and $\sigma_{cubic}(n)$ to denote the maximal value of the respective function for a word
of length $n$.
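For very small $n$ these quantities can be computed by exhaustive search. The sketch below is ours and purely illustrative (function names are hypothetical). It repeats a brute-force run enumeration so that it is self-contained and restricts the search to a fixed alphabet, binary by default. That restriction is an arbitrary choice made here, so the returned values are only lower bounds on $\rho(n)$ and $\sigma(n)$, which maximize over all words.

```python
from fractions import Fraction
from itertools import product


def shortest_period(u: str) -> int:
    # per(u) = |u| - longest proper border (KMP border array)
    border, k = [0] * len(u), 0
    for i in range(1, len(u)):
        while k > 0 and u[i] != u[k]:
            k = border[k - 1]
        if u[i] == u[k]:
            k += 1
        border[i] = k
    return len(u) - border[-1]


def runs(u: str) -> list[tuple[int, int, int]]:
    # Brute force: maximal p-periodic intervals of length >= 2p whose
    # shortest period is p, as 1-indexed triples (i, j, p).
    n, result = len(u), []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if u[k] != u[k + p]:
                k += 1
                continue
            m = k
            while m + p < n and u[m] == u[m + p]:
                m += 1
            if m - k >= p and shortest_period(u[k:m + p]) == p:
                result.append((k + 1, m + p, p))
            k = m
    return result


def rho_sigma(u: str, cubic: bool = False) -> tuple[int, Fraction]:
    """(rho(u), sigma(u)); with cubic=True, (rho_cubic(u), sigma_cubic(u))."""
    exps = [Fraction(j - i + 1, p) for i, j, p in runs(u)]
    if cubic:
        exps = [e for e in exps if e >= 3]  # cubic run: 3p <= j - i + 1
    return len(exps), sum(exps, Fraction(0))


def max_over_words(n: int, alphabet: str = "ab", cubic: bool = False):
    """Maxima of rho and sigma over all words of length n over `alphabet`.

    Exhaustive, so only usable for small n. The two maxima may be attained
    by different words.
    """
    best_rho, best_sigma = 0, Fraction(0)
    for letters in product(alphabet, repeat=n):
        r, s = rho_sigma("".join(letters), cubic)
        best_rho, best_sigma = max(best_rho, r), max(best_sigma, s)
    return best_rho, best_sigma


print(rho_sigma("aabaabaa"))               # (4, Fraction(26, 3))
print(rho_sigma("aabaabaa", cubic=True))   # (0, Fraction(0, 1))
print(max_over_words(4))                   # (2, Fraction(4, 1))
```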



3 Lower bound for $\sigma(n)$

Tables 1 and 2 list the sums of exponents of runs for several words of two known
families that contain a very large number of runs: the words $x_i$ defined by Franek
and Yang [7] (giving the lower bound $\rho(n) > 0.927n$, conjectured for some time
to be optimal) and the modified Padovan words $y_i$ defined by Simpson [18]
(giving the best known lower bound $\rho(n) > 0.944575712n$). These values have
been computed experimentally. They suggest that for the families of words $x_i$
and $y_i$ the ratio $\sigma(w)/|w|$ may remain below 2.

We show, however, a lower bound for $\sigma(n)$ that is greater than $2n$.

Theorem 1. There are infinitely many binary strings $w$ such that

$$\frac{\sigma(w)}{|w|} > 2.035.$$

Proof. Let us define two morphisms $\varphi : \{a, b, c\}^* \to \{a, b, c\}^*$ and $\psi : \{a, b, c\}^* \to \{0, 1\}^*$
as follows:

$$\varphi(a) = \mathtt{baaba}, \quad \varphi(b) = \mathtt{ca}, \quad \varphi(c) = \mathtt{bca},$$

$$\psi(a) = \mathtt{01011}, \quad \psi(b) = \psi(c) = \mathtt{01001011}.$$

We define $w_i = \psi(\varphi^i(a))$. Table 3 shows the sums of exponents of runs in words
$w_i$, computed experimentally.

Clearly, for any word $w = (w_8)^k$, $k \ge 1$, we have

$$\frac{\sigma(w)}{|w|} > 2.035.$$

□
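The construction of $w_i$ is easy to reproduce. The sketch below is ours (the names `PHI`, `PSI`, `apply_morphism` and `w` are hypothetical): it builds $w_i = \psi(\varphi^i(a))$ by iterating $\varphi$ on the letter $a$ and then applying $\psi$. The lengths it produces agree with the $|w_i|$ column of Table 3. Computing $\sigma(w_i)$ additionally requires a run-enumeration routine, which this sketch does not include.

```python
PHI = {"a": "baaba", "b": "ca", "c": "bca"}             # phi from the proof
PSI = {"a": "01011", "b": "01001011", "c": "01001011"}  # psi from the proof


def apply_morphism(morphism: dict[str, str], word: str) -> str:
    """Image of `word` under the morphism given letter by letter in `morphism`."""
    return "".join(morphism[letter] for letter in word)


def w(i: int) -> str:
    """The binary word w_i = psi(phi^i(a))."""
    x = "a"
    for _ in range(i):
        x = apply_morphism(PHI, x)
    return apply_morphism(PSI, x)


print([len(w(i)) for i in range(1, 5)])  # [31, 119, 461, 1751]
```

The words $(w_8)^k$ used in the proof are then simply `w(8) * k`.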



$i$   $|x_i|$    $\rho(x_i)/|x_i|$   $\sigma(x_i)$   $\sigma(x_i)/|x_i|$
1     6          0.3333              4.00            0.6667
2     27         0.7037              39.18           1.4510
3     116        0.8534              209.70          1.8078
4     493        0.9047              954.27          1.9356
5     2090       0.9206              4130.66         1.9764
6     8855       0.9252              17608.48        1.9885
7     37512      0.9266              74723.85        1.9920
8     158905     0.9269              316690.85       1.9930
9     673134     0.9270              1341701.95      1.9932

Table 1. Number of runs and sum of exponents of runs in Franek & Yang's [7] words $x_i$.



$i$   $|y_i|$    $\rho(y_i)/|y_i|$   $\sigma(y_i)$   $\sigma(y_i)/|y_i|$
4     37         0.7568              57.98           1.5671
8     125        0.8640              225.75          1.8060
12    380        0.9079              726.66          1.9123
16    1172       0.9309              2303.21         1.9652
20    3609       0.9396              7165.93         1.9856
24    11114      0.9427              22148.78        1.9929
28    34227      0.9439              68307.62        1.9957
32    105405     0.9443              210467.18       1.9967
36    324605     0.9445              648270.74       1.9971
40    999652     0.9445              1996511.30      1.9972

Table 2. Number of runs and sum of exponents of runs in Simpson's [18] modified
Padovan words $y_i$.



$i$   $|w_i|$    $\sigma(w_i)$   $\sigma(w_i)/|w_i|$
1     31         47.10           1.5194
2     119        222.26          1.8677
3     461        911.68          1.9776
4     1751       3533.34         2.0179
5     6647       13498.20        2.0307
6     25205      51264.37        2.0339
7     95567      194470.30       2.0349
8     362327     737393.11       2.0352
9     1373693    2795792.39      2.0352
10    5208071    10599765.15     2.0353

Table 3. Sums of exponents of runs in words $w_i$.



4 Upper bounds for $\sigma(n)$ and $\sigma_{cubic}(n)$



In this section we use the concept of handles of runs as defined in [6]. The
original definition refers only to cubic runs, but here we extend it to ordinary
runs as well.

Let $u \in \Sigma^*$ be a word of length $n$. Let us denote by $P = \{p_1, p_2, \ldots, p_{n-1}\}$
the set of inter-positions in $u$ that are located between pairs of consecutive letters
of $u$. We define a function $H$ assigning to each run $v$ in $u$ a set of some
inter-positions within $v$ (called handles from now on); that is, $H$ is a mapping from the set of
runs occurring in $u$ to the set $2^P$ of subsets of $P$. Let $v$ be a run with period
$p$ and let $w$ be the prefix of $v$ of length $p$. Let $w_{min}$ and $w_{max}$ be the minimal
and maximal words (in lexicographical order) cyclically equivalent to $w$. $H(v)$ is
defined as follows:

a) if $w_{min} = w_{max}$ then $H(v)$ contains all inter-positions within $v$,

b) if $w_{min} \ne w_{max}$ then $H(v)$ contains the inter-positions between consecutive
occurrences of $w_{min}$ in $v$ and between consecutive occurrences of $w_{max}$ in $v$.

Note that $H(v)$ can be empty for a non-cubic run $v$.






Fig. 1. An example of a word with two highlighted runs $v_1$ and $v_2$. For $v_1$ we have
$w_{min} \ne w_{max}$, and for $v_2$ both corresponding words are equal to $b$ (a one-letter word).
The inter-positions belonging to the sets $H(v_1)$ and $H(v_2)$ are indicated by arrows.



Proofs of the following properties of handles of runs can be found in [6]:

1. Case (a) in the definition of $H(v)$ implies that $|w_{min}| = 1$.

2. $H(v_1) \cap H(v_2) = \emptyset$ for any two distinct runs $v_1$ and $v_2$ in $u$.
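The definition of handles can be turned into a small checker. The following Python sketch is illustrative only: the function names are hypothetical, and it uses its own brute-force run enumeration rather than anything from [6]. Inter-position $p_k$ is represented by the integer $k$, lying between $u[k]$ and $u[k+1]$. Because $w$ is primitive, consecutive occurrences of $w_{min}$ (or $w_{max}$) inside a run of period $p$ are exactly $p$ positions apart, and the handle is the inter-position where one occurrence ends and the next begins.

```python
def shortest_period(u: str) -> int:
    # per(u) = |u| - longest proper border (KMP border array)
    border, k = [0] * len(u), 0
    for i in range(1, len(u)):
        while k > 0 and u[i] != u[k]:
            k = border[k - 1]
        if u[i] == u[k]:
            k += 1
        border[i] = k
    return len(u) - border[-1]


def runs(u: str) -> list[tuple[int, int, int]]:
    # Brute force: maximal p-periodic intervals of length >= 2p whose
    # shortest period is p, as 1-indexed triples (i, j, p).
    n, result = len(u), []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if u[k] != u[k + p]:
                k += 1
                continue
            m = k
            while m + p < n and u[m] == u[m + p]:
                m += 1
            if m - k >= p and shortest_period(u[k:m + p]) == p:
                result.append((k + 1, m + p, p))
            k = m
    return result


def handles(u: str, run: tuple[int, int, int]) -> set[int]:
    """H(v) for the run v = (i, j, p) of u (1-indexed, inclusive).

    Inter-position k (1 <= k <= |u| - 1) is the one between u[k] and u[k + 1].
    """
    i, j, p = run
    w = u[i - 1:i - 1 + p]  # prefix of v of length p
    rots = [w[t:] + w[:t] for t in range(p)]
    w_min, w_max = min(rots), max(rots)
    if w_min == w_max:  # case (a): every inter-position inside v
        return set(range(i, j))
    h = set()
    for x in (w_min, w_max):  # case (b)
        # 1-indexed starting positions of occurrences of x inside v
        starts = {s for s in range(i, j - p + 2) if u[s - 1:s - 1 + p] == x}
        for s in starts:
            if s + p in starts:  # occurrences at s and s + p are consecutive
                h.add(s + p - 1)  # the inter-position between them
    return h


u = "aabaabaa"
for r in runs(u):
    print(r, sorted(handles(u, r)))
# (1, 2, 1) [1]
# (4, 5, 1) [4]
# (7, 8, 1) [7]
# (1, 8, 3) [3, 5]
```

On this example the four handle sets are pairwise disjoint, as the second property requires. The same check can be repeated over all short binary words.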

To prove the upper bound for $\sigma(n)$, we need an additional property
of handles of runs. Let $\mathcal{R}(u)$ be the set of all runs in a word $u$, and let $\mathcal{R}_1(u)$
and $\mathcal{R}_{\ge 2}(u)$ be the sets of runs with period 1 and with period at least 2, respectively.

Lemma 1.

If $v \in \mathcal{R}_1(u)$ then $\sigma(v) = |H(v)| + 1$.
If $v \in \mathcal{R}_{\ge 2}(u)$ then $\lceil \sigma(v) \rceil \le \frac{|H(v)|}{2} + 3$.



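Proof. For $v \in \mathcal{R}_1(u)$, the claim follows directly from the definition
of handles: $H(v)$ contains all $|v| - 1$ inter-positions within $v$, while $\sigma(v) = |v|$.
In the opposite case, it suffices to note that both words $w_{min}^k$ and
$w_{max}^k$ for $k = \lfloor \sigma(v) \rfloor - 1$ are factors of $v$. Each of them contributes $k - 1$
handles, and these two sets of handles are disjoint because $w_{min} \ne w_{max}$ occur in $v$
at different offsets modulo $p$. Thus

$$|H(v)| \ge 2 \cdot (\lfloor \sigma(v) \rfloor - 2),$$

and hence $\lceil \sigma(v) \rceil \le \lfloor \sigma(v) \rfloor + 1 \le \frac{|H(v)|}{2} + 3$.

□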
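As a small illustration of the second case of Lemma 1 (an example of ours, not taken from [6]), consider the word $v = \mathtt{abababa}$ as a run with $p = 2$. Its inter-positions are numbered 1 to 6, with $k$ lying between $v[k]$ and $v[k+1]$:

```latex
\begin{aligned}
&v = \mathtt{abababa},\quad p = 2,\quad \sigma(v) = 7/2,\quad
  w_{min} = \mathtt{ab},\quad w_{max} = \mathtt{ba},\\
&k = \lfloor \sigma(v) \rfloor - 1 = 2:\quad
  (\mathtt{ab})^2 = v[1..4],\quad (\mathtt{ba})^2 = v[2..5],\\
&\mathtt{ab} \text{ at } 1,3,5 \Rightarrow \text{handles } 2,4;\qquad
  \mathtt{ba} \text{ at } 2,4,6 \Rightarrow \text{handles } 3,5,\\
&|H(v)| = 4 \ge 2(\lfloor \sigma(v) \rfloor - 2) = 2,\qquad
  \lceil \sigma(v) \rceil = 4 \le |H(v)|/2 + 3 = 5.
\end{aligned}
```

The bound is not tight here: the run has more handles than the proof guarantees, because the last incomplete period also leaves room for extra consecutive occurrences.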

Now we are ready to prove the upper bound for $\sigma(n)$. In the proof we use
the bound $\rho(n) < 1.029n$ on the number of runs from [5].

Theorem 2. The sum of the exponents of runs in a string of length $n$ is less
than $4.1n$.

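Proof. Let $u$ be a word of length $n$. Using Lemma 1, we obtain:

$$\begin{aligned}
\sum_{v \in \mathcal{R}(u)} \sigma(v)
&\le \sum_{v \in \mathcal{R}_1(u)} \bigl(|H(v)| + 1\bigr) + \sum_{v \in \mathcal{R}_{\ge 2}(u)} \Bigl(\frac{|H(v)|}{2} + 3\Bigr) \\
&= \sum_{v \in \mathcal{R}_1(u)} |H(v)| + |\mathcal{R}_1(u)| + \sum_{v \in \mathcal{R}_{\ge 2}(u)} \frac{|H(v)|}{2} + 3\,|\mathcal{R}_{\ge 2}(u)| \\
&\le 3\,|\mathcal{R}(u)| + A + B/2, \qquad (1)
\end{aligned}$$

where $A = \sum_{v \in \mathcal{R}_1(u)} |H(v)|$ and $B = \sum_{v \in \mathcal{R}_{\ge 2}(u)} |H(v)|$. Due to the disjointness
of handles of runs (the second property of handles), $A + B < n$, and thus
$A + B/2 < n$. Combining this with (1), we obtain:

$$\sum_{v \in \mathcal{R}(u)} \sigma(v) < 3\,|\mathcal{R}(u)| + n \le 3\rho(n) + n < 3 \cdot 1.029n + n < 4.1n.$$

□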

A similar approach for cubic runs, this time using the bound of $0.5n$ for
$\rho_{cubic}(n)$ from [6], immediately yields a stronger upper bound for
the function $\sigma_{cubic}(n)$.

Theorem 3. The sum of the exponents of cubic runs in a string of length $n$ is
less than $2.5n$.

Proof. Let $u$ be a word of length $n$. Using the same inequalities as in the proof of
Theorem 2, we obtain:

$$\sum_{v \in \mathcal{R}_{cubic}(u)} \sigma(v) < 3\,|\mathcal{R}_{cubic}(u)| + n \le 3\rho_{cubic}(n) + n \le 3 \cdot 0.5n + n = 2.5n,$$

where $\mathcal{R}_{cubic}(u)$ denotes the set of all cubic runs of $u$. □